enhance: retry build processing on failed server comms #594

Merged
ecrupper merged 2 commits into main from fix/worker-resiliency-exec-loop
Aug 26, 2024

ecrupper
Contributor

Similar to our server compilation retry loop, we can avoid killing the worker on failed server communications by adding the same logic. These failures come mainly from network blips and timeouts, and simply pausing should give the worker another chance at processing the build. This will limit "dropped" builds and, if the admins choose worker registration, unregistered workers.

Example pipeline:

steps:
  - name: sleep
    image: alpine:latest
    commands:
      - sleep 30

  - name: goodbye
    image: alpine:latest
    commands:
      - echo "goodbye"

Testing process (shell):

$ make up
...
...

# [ send two builds to localhost:8080/webhook ]

# wait for sleep step to run for a bit

$ docker kill server

# wait a few seconds or so

$ docker restart server

Without this change, the steps above result in either a complete loss of the build (if worker registration is not enabled) or an unregistered worker (if worker registration is enabled). Either way, this isn't ideal.

With this change, the worker still processes the second build, so long as the server isn't down for long.

@ecrupper ecrupper requested a review from a team as a code owner August 14, 2024 16:31
codecov bot commented Aug 14, 2024

Codecov Report

Attention: Patch coverage is 0% with 83 lines in your changes missing coverage. Please review.

Project coverage is 57.44%. Comparing base (dd70412) to head (0d846bf).
Report is 1 commit behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| cmd/vela-worker/exec.go | 0.00% | 53 Missing ⚠️ |
| cmd/vela-worker/register.go | 0.00% | 30 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #594      +/-   ##
==========================================
- Coverage   57.91%   57.44%   -0.47%     
==========================================
  Files         120      120              
  Lines        5014     5055      +41     
==========================================
  Hits         2904     2904              
- Misses       1907     1948      +41     
  Partials      203      203              

| Files | Coverage Δ |
| --- | --- |
| cmd/vela-worker/register.go | 0.00% <0.00%> (ø) |
| cmd/vela-worker/exec.go | 0.00% <0.00%> (ø) |

Contributor

@KellyMerrick KellyMerrick left a comment

lgtm

@ecrupper ecrupper merged commit 66403e5 into main Aug 26, 2024
11 of 13 checks passed
@ecrupper ecrupper deleted the fix/worker-resiliency-exec-loop branch August 26, 2024 14:30